Hartigan's K-Means Versus Lloyd's K-Means - Is It Time for a Change?
Authors
Abstract
Hartigan’s method for k-means clustering holds several potential advantages over the classical and prevalent optimization heuristic known as Lloyd’s algorithm. For example, it was recently shown that the set of local minima of Hartigan’s algorithm is a subset of those of Lloyd’s method. We develop a closed-form expression that allows Hartigan’s method to be applied to k-means clustering with any Bregman divergence, and we further strengthen the case for preferring Hartigan’s algorithm over Lloyd’s algorithm. Specifically, we characterize a range of problems, with various noise levels of the inputs, for which any random partition represents a local minimum for Lloyd’s algorithm, while Hartigan’s algorithm easily converges to the correct solution. Extensive experiments on synthetic and real-world data further support our theoretical analysis.
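To make the contrast in the abstract concrete, the following is a minimal sketch for the squared Euclidean case only. It is not the paper's closed-form expression for general Bregman divergences, which is not reproduced on this page. It relies on the well-known identity that moving a point x from cluster A to cluster B changes the k-means cost by n_B/(n_B+1)·‖x − c_B‖² − n_A/(n_A−1)·‖x − c_A‖². The function names `lloyd`, `hartigan`, and `kmeans_cost`, and the toy data, are illustrative assumptions; the sketch also assumes that every cluster is non-empty in the initial labeling.

```python
import numpy as np

def kmeans_cost(X, labels, k):
    """Sum of squared distances of each point to its cluster mean."""
    total = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members) > 0:
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def lloyd(X, labels, k, max_iter=100):
    """Lloyd's heuristic: batch-reassign all points to the nearest centroid."""
    labels = labels.copy()
    centers = np.zeros((k, X.shape[1]))
    for _ in range(max_iter):
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # no point changed cluster
        labels = new_labels
    return labels

def hartigan(X, labels, k, max_iter=100):
    """Hartigan's heuristic (squared Euclidean): move one point at a time
    whenever the move strictly lowers the total cost."""
    labels = labels.copy()
    counts = np.bincount(labels, minlength=k).astype(float)  # assumed all >= 1
    sums = np.zeros((k, X.shape[1]))
    np.add.at(sums, labels, X)             # per-cluster coordinate sums
    for _ in range(max_iter):
        moved = False
        for i in range(len(X)):
            a = labels[i]
            if counts[a] <= 1:
                continue                   # never empty a cluster
            centers = sums / counts[:, None]
            d = ((X[i] - centers) ** 2).sum(axis=1)
            # Cost added by inserting X[i] into each other cluster j.
            delta = counts / (counts + 1.0) * d
            # Cost removed by taking X[i] out of its current cluster a.
            delta[a] = counts[a] / (counts[a] - 1.0) * d[a]
            b = int(delta.argmin())
            if b != a and delta[b] < delta[a]:
                sums[a] -= X[i]; counts[a] -= 1
                sums[b] += X[i]; counts[b] += 1
                labels[i] = b
                moved = True
        if not moved:
            break
    return labels

# Toy 1-D data: the starting partition is already a fixed point of Lloyd's
# heuristic, yet a single-point move still lowers the cost.
X = np.array([[0.0], [2.0], [3.0], [3.2]])
start = np.array([0, 0, 1, 1])
lloyd_labels = lloyd(X, start, k=2)        # stays at [0, 0, 1, 1]
hartigan_labels = hartigan(X, start, k=2)  # moves the point 2.0: [0, 1, 1, 1]
```

On this toy input the point 2.0 is slightly closer to its own centroid (1.0) than to the other centroid (3.1), so Lloyd's reassignment step leaves the partition unchanged. Hartigan's criterion also accounts for how both centroids shift after the move, and so it relocates the point and reaches a lower cost.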
Similar resources
Hartigan's Method: k-means Clustering without Voronoi
Hartigan’s method for k-means clustering is the following greedy heuristic: select a point, and optimally reassign it. This paper develops two other formulations of the heuristic, one leading to a number of consistency properties, the other showing that the data partition is always quite separated from the induced Voronoi partition. A characterization of the volume of this separation is provide...
Proposing an approach to calculate headway intervals to improve bus fleet scheduling using a data mining algorithm
The growth of AVL (Automatic Vehicle Location) systems produces a huge amount of data about different parts of the bus fleet (buses, stations, passengers, etc.), which is very useful for improving bus fleet efficiency. In addition, by processing historical fleet and passenger data, it is possible to detect passengers’ behavioral patterns in different parts of the day and to use them in order to improve fle...
Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm
Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...
Comparing Model-based Versus K-means Clustering for the Planar Shapes
In some fields, there is an interest in distinguishing different geometrical objects from each other. The field of research that studies such objects from a statistical point of view, provided they are invariant under translation, rotation and scaling effects, is known as statistical shape analysis. Having some objects that are registered using key points on the outline...
The Analysis of a Simple k-Means Clustering
K-means clustering is a very popular clustering technique which is used in numerous applications. Given a set of n data points in R^d and an integer k, the problem is to determine a set of k points in R^d, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper we present...